Data Model (GIS)

A geographic data model, geospatial data model, or simply data model in the context of geographic information systems (GIS), is a mathematical and digital structure for representing phenomena over the Earth. Generally, such data models represent various aspects of these phenomena by means of geographic data, including spatial locations, attributes, change over time, and identity. For example, the vector data model represents geography as collections of points, lines, and polygons, and the raster data model represents geography as cell matrices that store numeric values. Data models are implemented throughout the GIS ecosystem, including the software tools for data management and spatial analysis, data stored in a variety of GIS file formats, specifications and standards, and specific designs for GIS installations. While the unique nature of spatial information has led to its own set of model structures, much of the process of data modeling is similar to the rest of information technology, including the progression from conceptual models to logical models to physical models, and the difference between generic models and application-specific designs.
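The contrast between the two models named above can be illustrated with a minimal Python sketch. The dictionary layout, the coordinates, the cell values, and the helper `raster_value_at` are all assumptions made for illustration; they do not correspond to any particular GIS format or library.

```python
# Vector model: discrete features, each a geometric shape plus attributes.
# All coordinates and attribute values are made up for illustration.
vector_layer = [
    {"geometry": ("Point", (2.0, 3.0)), "attributes": {"name": "well"}},
    {"geometry": ("LineString", [(0.0, 0.0), (4.0, 1.0), (6.0, 5.0)]),
     "attributes": {"name": "road"}},
    {"geometry": ("Polygon", [(1.0, 1.0), (5.0, 1.0), (5.0, 4.0),
                              (1.0, 4.0), (1.0, 1.0)]),
     "attributes": {"name": "parcel"}},
]

# Raster model: a matrix of cells storing numeric values, plus the
# georeferencing needed to relate a cell to a location.
raster = {
    "origin": (0.0, 6.0),   # x, y of the upper-left corner (assumed convention)
    "cell_size": 2.0,       # width and height of each square cell
    "values": [
        [10.5, 11.0, 12.2],
        [9.8, 10.1, 11.7],
        [9.0, 9.6, 10.9],
    ],
}


def raster_value_at(raster, x, y):
    """Return the value of the cell covering (x, y), or None if outside."""
    x0, y0 = raster["origin"]
    size = raster["cell_size"]
    col = int((x - x0) // size)   # columns increase eastward
    row = int((y0 - y) // size)   # rows increase southward from the top edge
    values = raster["values"]
    if 0 <= row < len(values) and 0 <= col < len(values[0]):
        return values[row][col]
    return None


print(raster_value_at(raster, 2.0, 3.0))
```

In the vector layer, a location is looked up through the features that cover it; in the raster, every location inside the grid extent maps to exactly one cell.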


History

The earliest computer systems that represented geographic phenomena were quantitative analysis models developed during the quantitative revolution in geography in the 1950s and 1960s; these could not be called geographic information systems because they did not attempt to store geographic data in a consistent permanent structure, but were usually statistical or mathematical models. The first true GIS software modeled spatial information using data models that would come to be known as raster or vector:

* SYMAP (by Howard Fisher, Harvard Laboratory for Computer Graphics and Spatial Analysis, developed 1963–1967) produced raster maps, although data was usually entered as vector-like region outlines or sample points and then interpolated into a raster structure for output. The GRID package, developed at the lab in 1969 by David Sinton, was based on SYMAP but was more focused on the permanent storage and analysis of gridded data, thus becoming perhaps the first general-purpose raster GIS software.
* The Canada Geographic Information System (CGIS; by Roger Tomlinson, Canada Land Inventory, developed 1963–1968) stored natural resource data as "faces" (vector polygons), although these were typically derived from raster scans of paper maps.
* Dual Independent Map Encoding (DIME, US Census Bureau, 1967) was perhaps the first robust vector data model incorporating network and polygon topology and attributes sufficient to allow address geocoding.
* Like the CGIS, early GIS installations in the United States were often focused on inventories of land use and natural resources, including the Minnesota Land Management Information System (MLMIS, 1969), the Land Use and Natural Resources Inventory of New York (LUNR, 1970), and the Oak Ridge Regional Modelling Information System (ORRMIS, 1973). Unlike CGIS, these were all raster systems inspired by SYMAP, although the MLMIS was based on subsections of the Public Land Survey System, which is not a perfect regular grid.

Most first-generation GIS were custom-built for specific needs, with data models designed to be stored and processed as efficiently as possible within the technology limitations of the day (especially punched cards and limited mainframe processing time). During the 1970s, the early systems had produced sufficient results to compare them and evaluate the effectiveness of their underlying data models. This led to efforts at the Harvard Lab and elsewhere focused on developing a new generation of generic data models, such as the POLYVRT topological vector model that would form the basis for commercial software and data such as the Esri Coverage.

As commercial off-the-shelf GIS software, GIS installations, and GIS data proliferated in the 1980s, scholars began to look for conceptual models of geographic phenomena that seemed to underlie the common data models, trying to discover why the raster and vector data models seemed to make common sense, and how they measured and represented the real world. This was one of the primary threads that formed the subdiscipline of geographic information science in the early 1990s.

Further developments in GIS data modeling in the 1990s were driven by rapid increases in both the GIS user base and computing capability. Major trends included 1) the development of extensions to the traditional data models to handle more complex needs such as time, three-dimensional structures, uncertainty, and multimedia; and 2) the need to efficiently manage exponentially increasing volumes of spatial data with enterprise needs for multiuser access and security. These trends eventually culminated in the emergence of spatial databases incorporated into relational databases and object-relational databases.


Types of data models

Because the world is much more complex than can be represented in a computer, all geospatial data are incomplete approximations of the world. Thus, most geospatial data models encode some form of strategy for collecting a finite ''sample'' of an often infinite domain, and a ''structure'' to organize the sample in such a way as to enable ''interpolation'' of the nature of the unsampled portion. For example, a building consists of an infinite number of points in space; a vector polygon represents it with a few ordered points, which are connected into a closed outline by straight lines, with all interior points assumed to be part of the building; furthermore, a "height" attribute may be the only representation of its three-dimensional volume.

The process of designing geospatial data models is similar to data modeling in general, at least in its overall pattern. For example, it can be segmented into three distinct levels of model abstraction:

* Conceptual data model, a high-level specification of how information is organized in the mind and in enterprise processes, without regard to the restrictions of GIS and other computer systems. It is common to develop and represent a conceptual model visually using tools such as an entity-relationship model.
* Logical data model, a broad strategy for how to represent the conceptual model in the computer, sometimes novel but often within the framework of existing software, hardware, and standards. The unified modeling language (UML), specifically the class diagram, is commonly used for visually developing logical and physical models.
* Physical data model, the detailed specification of how data will be structured in memory or in files.

Each of these models can be designed in one of two situations or ''scopes'':

* A generic data model is intended to be employed in a wide variety of applications, by discovering consistent patterns in the ways that society in general conceptualizes information and/or structures that work most efficiently in computers. For example, the field is a generic conceptual model of geographic phenomena, the relational database model and vector are generic logical models, while the shapefile format is a generic physical model. These models are typically implemented directly into software and GIS file formats. In the past, these models have been designed by academic researchers, by standards bodies such as the Open Geospatial Consortium, and by software vendors such as Esri. While academic and standard models are public (and sometimes open source), companies may choose to keep the details of their model a secret (as Esri attempted to do with the coverage and the file geodatabase) or to publish them openly (as Esri did with the shapefile).
* A specific data model or GIS design is a specification of the data needed for a particular enterprise or project GIS application. It is generally created within the constraints of chosen generic data models, so that existing GIS software can be used. For example, a data model for a city would include a list of data layers to be included (e.g., roads, buildings, parcels, zoning), with each being specified with the type of generic spatial data model being used (e.g., raster or vector), choices of parameters such as coordinate system, and its attribute columns.
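A specific GIS design of the kind described in the last item can be written down as a small, checkable specification. The following Python sketch is only one possible way to do so: the `LayerSpec` class, the `validate` helper, the attribute columns, and the choice of `EPSG:4326` as the coordinate system are assumptions for illustration, not part of any standard or software product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LayerSpec:
    """One layer in a project-specific GIS design (hypothetical structure)."""
    name: str
    model: str                  # generic spatial model: "vector" or "raster"
    geometry: Optional[str]     # vector primitive; None for raster layers
    crs: str                    # coordinate system identifier
    attributes: Dict[str, str] = field(default_factory=dict)  # column -> type


def validate(layers: List[LayerSpec]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for layer in layers:
        if layer.model not in ("vector", "raster"):
            problems.append(f"{layer.name}: unknown model {layer.model!r}")
        if layer.model == "vector" and layer.geometry is None:
            problems.append(f"{layer.name}: vector layer needs a geometry type")
        if layer.model == "raster" and layer.geometry is not None:
            problems.append(f"{layer.name}: raster layer should not set geometry")
    return problems


CRS = "EPSG:4326"  # WGS 84 latitude/longitude, used here only as a placeholder

# Layer names follow the city example in the text; columns are invented.
city_design = [
    LayerSpec("roads", "vector", "LineString", CRS,
              {"name": "text", "lanes": "integer"}),
    LayerSpec("buildings", "vector", "Polygon", CRS, {"height_m": "real"}),
    LayerSpec("parcels", "vector", "Polygon", CRS,
              {"parcel_id": "text", "owner": "text"}),
    LayerSpec("zoning", "raster", None, CRS, {"zone_code": "integer"}),
]

print(validate(city_design))
```

Each entry records the three choices the text mentions: the generic model, parameters such as the coordinate system, and the attribute columns. Whether zoning is better stored as raster or vector is itself a design decision; raster is used here only to show both cases.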


Conceptual spatial models

Generic geospatial conceptual models attempt to capture both the physical nature of geographic phenomena and how people think about them and work with them. Contrary to the standard modeling process described above, the data models upon which GIS is built were not originally designed based on a general conceptual model of geographic phenomena, but were largely designed according to technical expediency, likely influenced by common sense conceptualizations that had not yet been documented.

That said, an early conceptual framework that was very influential in early GIS development was the recognition by Brian Berry and others that geographic information can be decomposed into the description of three very different aspects of each phenomenon: space, time, and attribute/property/theme. As a further development in 1978, David Sinton presented a framework that characterized different strategies for measurement, data, and mapping as holding one of the three aspects constant, controlling a second, and measuring the third.

During the 1980s and 1990s, a body of spatial information theories gradually emerged as a major subfield of geographic information science, incorporating elements of philosophy (especially ontology), linguistics, and sciences of spatial cognition. By the early 1990s, a basic dichotomy had emerged of two alternative ways of making sense of the world and its contents:

* An ''object'' (also called a ''feature'' or ''entity'') is a distinct "thing," comprehended as a whole. It may be a visible, material object, such as a building or road, or an abstract entity such as a county or the market area of a retail store.
* A ''field'' is a property that varies over space, so that it potentially has a distinct measurable value at any location within its extent. It may be a physical, directly measurable characteristic of matter akin to the intensive properties of chemistry, such as temperature or density; or it may be an abstract concept defined via a mathematical model, such as the likelihood that a person living at each location will use a local park.

These two conceptual models are not meant to represent different phenomena, but often are different ways of conceptualizing and describing the same phenomenon. For example, a lake is an object, but the temperature, clarity, and proportion of pollution of the water in the lake are each fields (the water itself may be considered as a third concept of a ''mass'', but this is not as widely accepted as objects and fields).
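The lake example can be expressed as a rough Python sketch: the object is a record with an identity, a shape, and attributes, while the field is a function that returns a value for any location. The names `GeoObject`, `water_temperature`, and `mean_over_samples`, as well as the temperature formula and coordinates, are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Coordinate = Tuple[float, float]


@dataclass
class GeoObject:
    """Object view: a distinct thing with identity, shape, and attributes."""
    identity: str
    outline: List[Coordinate]
    attributes: Dict[str, object]


def water_temperature(x: float, y: float) -> float:
    """Field view: a value exists at every location (made-up linear formula)."""
    return 20.0 - 0.5 * (x + y)


def mean_over_samples(field_fn: Callable[[float, float], float],
                      points: List[Coordinate]) -> float:
    """Summarize a field by averaging its value at a finite set of samples."""
    values = [field_fn(x, y) for x, y in points]
    return sum(values) / len(values)


# The same lake described both ways: as an object with an outline ...
lake = GeoObject(
    identity="lake-1",
    outline=[(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)],
    attributes={"name": "Example Lake"},
)

# ... and through a field sampled at points inside that outline.
samples = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0)]
lake.attributes["mean_temp_c"] = mean_over_samples(water_temperature, samples)

print(lake.attributes)
```

Collapsing the field into a single attribute of the object discards its spatial variation, which is one reason the two conceptualizations are treated as complementary rather than interchangeable.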


Vector data model

The vector logical model represents each geographic location or phenomenon by a geometric shape and a set of values for its attributes. Each geometric shape is represented using
coordinate geometry In classical mathematics, analytic geometry, also known as coordinate geometry or Cartesian geometry, is the study of geometry using a coordinate system. This contrasts with synthetic geometry. Analytic geometry is used in physics and engineerin ...
, by a structured set of coordinates (x,y) in a
geographic coordinate system The geographic coordinate system (GCS) is a spherical or ellipsoidal coordinate system for measuring and communicating positions directly on the Earth as latitude and longitude. It is the simplest, oldest and most widely used of the various ...
, selected from a set of available
geometric primitive In vector computer graphics, CAD systems, and geographic information systems, geometric primitive (or prim) is the simplest (i.e. 'atomic' or irreducible) geometric shape that the system can handle (draw, store). Sometimes the subroutines that ...
s, such as points, lines, and polygons. Although there are dozens of vector file formats (i.e., physical data models) used in various GIS software, most conform to the Simple Feature Access (SFA) specification from the
Open Geospatial Consortium The Open Geospatial Consortium (OGC), an international voluntary consensus standards organization for geospatial content and location-based services, sensor web and Internet of Things, GIS data processing and data sharing. It originated in 1994 ...
(OGC). It was developed in the 1990s by finding common ground between existing vector models, and is now enshrined as ISO 19125, the reference standard for the vector data model. OGC-SFA includes the following vector
geometric primitive In vector computer graphics, CAD systems, and geographic information systems, geometric primitive (or prim) is the simplest (i.e. 'atomic' or irreducible) geometric shape that the system can handle (draw, store). Sometimes the subroutines that ...
s: * ''Point'': a single coordinate in two- or three-dimensional space. Many vector formats allow a single feature to consist of several isolated points (a ''MultiPoint'' in OGC-SFA). * ''Curve'' (alternatively called a ''polyline'' or ''linestring''): a line includes an infinite number of points, but it is represented by a finite ordered sample of points (called ''vertices''), allowing for software to interpolate the intervening points. Traditionally, this was a linear interpolation (OGC-SFA calls this case a ''LineString''), but some vector formats allow for curves (usually
circular arc Circular may refer to: * The shape of a circle * ''Circular'' (album), a 2006 album by Spanish singer Vega * Circular letter (disambiguation) ** Flyer (pamphlet), a form of advertisement * Circular reasoning, a type of logical fallacy * Circular ...
s or
Bézier curve A Bézier curve ( ) is a parametric curve used in computer graphics and related fields. A set of discrete "control points" defines a smooth, continuous curve by means of a formula. Usually the curve is intended to approximate a real-world shape t ...
s), or for a single feature to consist of multiple disjoint curves (a ''MultiCurve'' in OGC-SFA). * ''Polygon'': a region also includes an infinite number of points, so the vector model represents its boundary as a closed line (called a ''ring'' in OGC-SFA), allowing the software to interpolate the interior. GIS software distinguishes the interior and the exterior by requiring that the line be ordered counter-clockwise, so the interior is always on the left side of the boundary. In nearly every format, a polygon can have "holes" (e.g., an island in a lake) by including interior rings, each in clockwise order (so the interior is still on the left). As with lines, curved boundaries may be allowed; usually a single feature may include multiple polygons, which OGC-SFA collectively terms a ''surface''. * ''
Text Text may refer to: Written word * Text (literary theory), any object that can be read, including: **Religious text, a writing that a religious tradition considers to be sacred **Text, a verse or passage from scripture used in expository preachin ...
'' (alternatively called ''annotation''): a minority of vector data formats, including the Esri geodatabase and
Autodesk Autodesk, Inc. is an American multinational software corporation that makes software products and services for the architecture, engineering, construction, manufacturing, media, education, and entertainment industries. Autodesk is headquartered ...
.dwg DWG (from ''drawing'') is a proprietary binary file format used for storing two- and three- dimensional design data and metadata. It is the native format for several CAD packages including DraftSight, AutoCAD, BricsCAD, IntelliCAD (and i ...
, support the storage of text in the database. An annotation is usually represented as a point or curve (the ''baseline'') with a set of attributes giving the text content and design characteristics (font, size, spacing, etc.). The geometric shape stored in a vector data set representing a phenomenon may or may not be of the same
dimension In physics and mathematics, the dimension of a Space (mathematics), mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any Point (geometry), point within it. Thus, a Line (geometry), lin ...
as the real-world phenomenon itself. It is common to represent a feature by a lower dimension than its real nature, based on the scale and purpose of the representation. For example, a city (a two-dimensional region) may be represented as a point, or a road (a three-dimensional structure) may be represented as a line. As long as the user is aware that the latter is a representation choice and a road is not really a line, this generalization can be useful for applications such as
transport network analysis A transport network, or transportation network, is a network or graph in geographic space, describing an infrastructure that permits and constrains movement or flow. Examples include but are not limited to road networks, railways, air routes ...
. Based on this basic strategy of geometric shapes and attributes, vector data models use a variety of structures to collect these into a single data set (often called a ''layer''), usually containing a set of related features (e.g., roads). These can be categorized into several approaches: * The '' georelational data model'' was the basis for most early vector GIS software. The geometric data and the attribute data are stored separately; this was originally because the geometric data required GIS-specific code to process it, but existing
relational database A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relatio ...
software (RDBMS) could be used to manage the attributes. For example, Esri ARC/INFO (later
ArcInfo ArcInfo (formerly ARC/INFO) is a full-featured geographic information system produced by Esri, and is the highest level of licensing (and therefore functionality) in the ArcGIS Desktop product line. It was originally a command-line based system. T ...
) was originally composed of two separate programs: ARC was written by Esri for spatial management and analysis, while INFO was a licensed commercial RDBMS program. It was termed "georelational" because in keeping with the principles of relational databases, the geometry and attributes could be joined by matching each shape with a row in the table using a ''key'', such as the row number or an ID number. * The ''
spatial database A spatial database is a general-purpose database (usually a relational database) that has been enhanced to include spatial data that represents objects defined in a geometric space, along with tools for querying and analyzing such data. Most spa ...
'' (also called the ''object-based model'') first appeared in the 1990s. It also leverages the maturity of
relational database management system A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relatio ...
s, especially for their ability to manage extremely large enterprise databases. Instead of storing geometric data separately, the spatial database defines a geometry data type, allowing the shapes to be stored in a column in the same table as the attributes, creating a single unified data set for each layer. Most RDBMS software (both commercial and open-source) has spatial extensions that enable the storage and querying of geometric data, usually based on the Simple Features-SQL standard from the
Open Geospatial Consortium
. Some non-database data formats also integrate geometric and attribute data for each object into a single structure, such as
GeoJSON
.

Vector data structures can also be classified by how they manage topological relationships between objects in a dataset:
* A ''topological data model'' incorporates topological relationships as a core part of the model design. The GBF/DIME format from the U.S. Census Bureau was probably the first topological data model; another early example was POLYVRT, developed at the
Harvard Laboratory for Computer Graphics and Spatial Analysis
in the 1970s, eventually evolving into the Esri ARC/INFO Coverage format. In this structure, lines are broken at all intersection points; these ''nodes'' can then store topological information about which lines connect there. Polygons are not stored separately, but are defined as a set of lines that collectively close. Each line contains information about the polygons on its right and left, thus explicitly storing topological adjacency. This structure was designed to enable composite line-polygon structures (e.g., the census block),
address geocoding
, and
transport network analysis
. It also had the benefit of increased storage efficiency and reduced error, because the shared border of each pair of adjacent polygons was only digitized once. However, it is a fairly complicated data structure. Almost all topological data models are also georelational.
* A ''spaghetti data model'' does not include any information about topology (so called because the individual strands in a bowl of spaghetti may overlap without connecting). It was common in early GIS software such as the
Map Overlay and Statistical System
(MOSS) as well as most recent data formats, such as the Esri
shapefile
, Geography Markup Language (GML), and almost all
spatial database
s. In this model, each feature geometry is encoded separately from any others in the data set, regardless of whether they may be topologically related. For example, the shared boundary between two adjacent regions would be duplicated in each polygon shape. Despite the increased data volume and potential for error compared with topological data, this model has dominated GIS since 2000, largely due to its conceptual simplicity. Some GIS software has tools for validating topological integrity rules (e.g., not allowing polygons to overlap or have gaps) on spaghetti data to prevent or correct topological errors.
* A ''hybrid topological data model'' has the option of storing topological relationship information as a separate layer built on top of a spaghetti data set. An example is the network dataset within the Esri
geodatabase
.

Vector data are commonly used to represent conceptual
objects
(e.g., trees, buildings, counties), but they can also represent
fields
. As an example of the latter, a temperature field could be represented by an irregular sample of points (e.g., weather stations), or by ''isotherms'', a sample of lines of equal temperature.
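
The difference between the georelational and unified approaches described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the road data, the `join_layer` helper, and the use of an `id` key are all hypothetical, and the output merely imitates the general shape of a GeoJSON feature rather than following any particular software's implementation.

```python
# Georelational style: geometry and attributes live in separate stores,
# linked only by a shared key. All values below are made up for illustration.
geometries = {
    1: [(0.0, 0.0), (4.0, 0.0)],  # road 1 as a list of (x, y) vertices
    2: [(4.0, 0.0), (4.0, 3.0)],  # road 2
}
attributes = [
    {"id": 1, "name": "Main St", "lanes": 2},
    {"id": 2, "name": "Oak Ave", "lanes": 4},
]


def join_layer(geometries, attributes, key="id"):
    """Match each attribute row to its shape by key and return unified features."""
    features = []
    for row in attributes:
        shape = geometries.get(row[key])
        if shape is None:
            continue  # attribute row without a matching geometry is skipped
        features.append({
            "type": "Feature",
            "id": row[key],
            "geometry": {
                "type": "LineString",
                "coordinates": [list(point) for point in shape],
            },
            "properties": {k: v for k, v in row.items() if k != key},
        })
    return features


# Unified style: each feature now carries its geometry and attributes together.
layer = join_layer(geometries, attributes)
```

After the join, each element of `layer` resembles what a spatial database row or a GeoJSON feature holds natively: one record with both the shape and its attributes.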

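The left/right polygon bookkeeping of a topological data model can also be sketched briefly. The example below assumes two hypothetical polygons, A (west) and B (east), that share a single arc running between nodes `n1` and `n2`; the dictionary layout and the function names are illustrative and are not taken from GBF/DIME, POLYVRT, or the Coverage format.

```python
from collections import defaultdict

# Each arc runs from one node to another and records the polygon on its left
# and on its right. None stands for the area outside the data set.
arcs = [
    {"id": "a1", "from": "n1", "to": "n2", "left": None, "right": "A"},  # outer edge of A
    {"id": "a2", "from": "n1", "to": "n2", "left": "A", "right": "B"},   # shared border, stored once
    {"id": "a3", "from": "n1", "to": "n2", "left": "B", "right": None},  # outer edge of B
]


def adjacency(arcs):
    """Derive polygon adjacency directly from the left/right labels of each arc."""
    neighbors = defaultdict(set)
    for arc in arcs:
        left, right = arc["left"], arc["right"]
        if left is not None and right is not None and left != right:
            neighbors[left].add(right)
            neighbors[right].add(left)
    return dict(neighbors)


def boundary_arcs(arcs, polygon):
    """A polygon is not stored as a shape; it is the set of arcs that mention it."""
    return [arc["id"] for arc in arcs if polygon in (arc["left"], arc["right"])]


def arcs_at_node(arcs, node):
    """Nodes let the model answer which lines connect at an intersection."""
    return [arc["id"] for arc in arcs if node in (arc["from"], arc["to"])]
```

In a spaghetti model the same two polygons would each carry their own copy of the shared border, and adjacency would have to be inferred by comparing coordinates.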

Raster data model

The raster logical model represents a
field
using a
tessellation
of geographic space into a regularly spaced two-dimensional array of locations (each called a ''cell''), with a single attribute value for each cell (or more than one value in a multi-band raster). Typically, each cell either represents a single central point sample (in which case the measurement model for the entire raster is called a ''lattice'') or it represents a summary (usually the mean) of the field variable over the square area (in which case the model is called a ''grid''). The general data model is essentially the same as that used for images and other
raster graphics
, with the addition of capabilities for the geographic context.

To represent a raster grid in a computer file, it must be serialized into a single (one-dimensional) list of values. While there are various possible ordering schemes, the most commonly used is ''row-major'', in which the cells in the first row are listed first, followed immediately by the cells in the second row, and so on, as follows:

6 7 10 9 8 6 7 8 6 8 9 10 8 7 7 7 7 8 9 10 9 8 7 6 8 8 9 11 10 9 9 7 . . .

To reconstruct the original grid, a header is required with general parameters for the grid. At the very least, it requires the number of cells in each row (that is, the number of columns), so that the reader knows where to begin each new row, and the datatype of each value (i.e., the number of bits in each value before beginning the next value).

While the raster model is closely tied to the field conceptual model, objects can also be represented in raster, essentially by transforming an object ''X'' into a discrete (Boolean) field of ''presence/absence of X''. Alternatively, a layer of objects (usually polygons) could be transformed into a discrete field of object identifiers. In this case, some raster file formats allow a vector-like table of attributes to be joined to the raster by matching the ID values. Raster representations of objects are often temporary, only created and used as part of a modelling procedure, rather than in a permanent data store.

To be useful in GIS, a raster file must be georeferenced to correspond to real-world locations, as a raw raster can only express locations in terms of rows and columns. This is typically done with a set of
metadata
parameters, either in the file header (such as the
GeoTIFF
format) or in a
sidecar file
(such as a world file). At the very least, the georeferencing
metadata
must include the location of at least one cell in the chosen coordinate system and the ''resolution'' or ''cell size'', the distance between adjacent cells. A linear
affine transformation
is the most common type of georeferencing, allowing rotation and rectangular cells. More complex georeferencing schemes include polynomial and spline transformations. Raster data sets can be very large, so
image compression
techniques are often used. Compression algorithms identify spatial patterns in the data, then transform the data into parameterized representations of the patterns, from which the original data can be reconstructed. In most GIS applications,
lossless compression
algorithms (e.g., Lempel-Ziv) are preferred over lossy ones (e.g.,
JPEG
), because the complete original data are needed, not an interpolation.
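
Row-major serialization and its reversal can be sketched as follows. The values are the ones listed in the example above; the row length of 8 is an assumption made purely for illustration (the text does not state it), and the `header` dictionary and function names are hypothetical rather than taken from any real file format.

```python
# The serialized values from the example; the row length below is ASSUMED.
values = [6, 7, 10, 9, 8, 6, 7, 8, 6, 8, 9, 10, 8, 7, 7, 7,
          7, 8, 9, 10, 9, 8, 7, 6, 8, 8, 9, 11, 10, 9, 9, 7]

# Minimal header: how many cells make up a row, and how each value is stored.
header = {"ncols": 8, "dtype": "uint8"}


def deserialize(values, ncols):
    """Cut the one-dimensional list back into rows of ncols cells each."""
    if ncols <= 0 or len(values) % ncols != 0:
        raise ValueError("value count is not a whole number of rows")
    return [values[i:i + ncols] for i in range(0, len(values), ncols)]


def serialize(grid):
    """Row-major order: every cell of row 0, then every cell of row 1, and so on."""
    return [value for row in grid for value in row]


def cell_index(row, col, ncols):
    """Position of cell (row, col) within the serialized list."""
    return row * ncols + col


grid = deserialize(values, header["ncols"])
```

Without the column count in the header, the same list could equally be read as 2 rows of 16 cells or 16 rows of 2, which is why that parameter is the minimum needed to reconstruct the grid.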

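Affine georeferencing can be illustrated with the six-parameter convention used by world files, where `x = A*col + B*row + C` and `y = D*col + E*row + F`. The parameter values below (30-unit cells, no rotation, an arbitrary origin) are hypothetical, and the function names are illustrative only.

```python
# Hypothetical affine parameters in world-file style:
#   x = A*col + B*row + C
#   y = D*col + E*row + F
# A and E are the cell sizes (E is negative because row numbers increase
# downward), B and D are rotation terms, and (C, F) locates the centre of
# the upper-left cell.
A, B, C = 30.0, 0.0, 500000.0
D, E, F = 0.0, -30.0, 4100000.0


def cell_to_world(col, row):
    """Map a (col, row) cell position to coordinates in the chosen system."""
    x = A * col + B * row + C
    y = D * col + E * row + F
    return x, y


def world_to_cell(x, y):
    """Invert the affine transformation to find the (col, row) of a location."""
    det = A * E - B * D
    if det == 0:
        raise ValueError("affine transformation is not invertible")
    col = (E * (x - C) - B * (y - F)) / det
    row = (-D * (x - C) + A * (y - F)) / det
    return col, row
```

With `B` and `D` set to zero the grid is axis-aligned; non-zero values rotate or shear it, and unequal magnitudes of `A` and `E` give rectangular rather than square cells.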

Extensions

Starting in the 1990s, as the original data models and GIS software matured, one of the primary foci of data modeling research was on developing extensions to the traditional models to handle more complex geographic information.


Spatiotemporal models

Time has always played an important role in analytical geography, dating back at least to
Brian Berry
's regional science matrix (1964) and the
time geography
of
Torsten Hägerstrand
(1970). At the dawn of the
GIScience
era of the early 1990s, the work of Gail Langran opened the doors to research into methods of explicitly representing change over time in GIS data; this led to many conceptual and data models emerging in the decades since. Some forms of temporal data began to be supported in off-the-shelf GIS software by 2010. Common models for representing time in vector and raster GIS data include:
* The ''snapshot'' model (also known as ''time-stamped layers''), in which an entire dataset is tied to a particular valid time. That is, it is a "snapshot" of the world at that time.
* ''Time-stamped features'', in which the dataset includes features valid at a variety of times, with each feature stamped by the time during which it was valid (e.g., by "start date" and "end date" columns in the attribute table). Some GIS software, such as ArcGIS Pro, natively supports this model, with functionality including animation.
* ''Time-stamped boundaries'', using the topological vector data model to decompose polygons into boundary segments, and stamping each segment by the time during which it was valid. This method was pioneered by the
Great Britain Historical GIS
.
* ''Time-stamped facts'', in which each individual datum (including attribute values) can have its own time stamp, allowing for the attributes within a single feature to change over time, or for a single feature (with constant identity) to have different geometric shapes at different times.
* ''Time as dimension'', which treats time as another (3rd or 4th) spatial dimension and uses multidimensional vector or raster structures to create geometries incorporating time. Hägerstrand visualized his time geography this way, and some GIS models based on it use this approach. The
NetCDF
format supports managing temporal raster data by treating time as a dimension.
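
The relationship between the time-stamped features model and the snapshot model can be sketched as a simple filter. The county records, the `start`/`end` field names, and the choice of half-open validity intervals are all assumptions made for illustration; real GIS software may define interval boundaries differently.

```python
from datetime import date

# Hypothetical time-stamped features: each carries its own validity interval.
# An end of None means the feature is still valid.
features = [
    {"name": "County A", "start": date(1900, 1, 1), "end": date(1950, 1, 1)},
    {"name": "County A (enlarged)", "start": date(1950, 1, 1), "end": None},
    {"name": "County B", "start": date(1900, 1, 1), "end": date(1950, 1, 1)},
]


def snapshot(features, when):
    """Return the features valid at `when`, using half-open [start, end) intervals."""
    return [
        feature for feature in features
        if feature["start"] <= when
        and (feature["end"] is None or when < feature["end"])
    ]


names_1925 = [f["name"] for f in snapshot(features, date(1925, 6, 1))]
names_1950 = [f["name"] for f in snapshot(features, date(1950, 1, 1))]
```

Each call to `snapshot` yields what the snapshot model would store as a separate layer, which shows why time-stamped features are the more compact of the two when only a few features change.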


Three-dimensional models

There are several approaches for representing three-dimensional map information, and for managing it in the
data model
. Some of these were developed specifically for GIS, while others have been adopted from
3D computer graphics
or
computer-aided drafting
(CAD).
* ''Height fields'' (also known as "2 1/2 dimensional surfaces") model three-dimensional phenomena by a single functional surface, in which elevation is a function of two-dimensional location, allowing it to be represented using field techniques such as isolated points,
contour line
s, raster (the
digital elevation model
), and
triangulated irregular network
s.
* A ''
polygon mesh
'' (related to the mathematical
polyhedron
) is a logical extension of the vector data model, and is probably the 3-D model type most widely supported in GIS. A volumetric object is reduced to its outer surface, which is represented by a set of polygons (often triangles) that collectively completely enclose a volume.
* The ''
voxel
'' model is the logical extension of the raster data model, by tessellating three-dimensional space into cubes called ''voxels'' (a portmanteau of ''volume'' and ''pixel'', the latter being itself a portmanteau).
NetCDF
is one of the most common data formats that supports 3-D cells.
* ''Vector-based stack-unit'' maps depict the vertical succession of geologic units to a specified depth. This mapping approach characterizes the vertical variations of physical properties in each 3-D map unit. For example, where an alluvial deposit (unit "a") overlies glacial till (unit "t"), the stack-unit labeled "a/t" indicates that relationship, whereas the unit "t" indicates that glacial till extends down to the specified depth. The stack-unit's occurrence (the map unit's outcrop), geometry (the map unit's boundaries), and descriptors (the physical properties of the geologic units included in the stack-unit) are managed as they are for a typical 2-D geologic map.
* ''Raster-based stacked surfaces'' depict the surface of each buried geologic unit, and can accommodate data on lateral variations of physical properties. In an example from Soller and others (1999), the upper surface of each buried geologic unit was represented in raster format as an ArcInfo Grid file (D.R. Soller et al. (1999). "Inclusion of digital map products in the National Geologic Map Database". In Soller, D.R., ed., ''Digital Mapping Techniques '99—Workshop Proceedings''. U.S. Geological Survey Open-File Report 99-386, p. 35–38). One of these grids is the uppermost surface of an economically important aquifer, the Mahomet Sand, which fills a pre- and inter-glacial valley carved into the bedrock surface. Each geologic unit in raster format can be managed in the data model in a manner similar to that described for the stack-unit map. The Mahomet Sand is continuous in this area, and represents one occurrence of this unit in the data model. Each raster cell, or pixel, on the Mahomet Sand surface has a set of map coordinates that are recorded in a GIS (in the data model bin that is labeled "pixel coordinates", which is the raster counterpart of the "geometry" bin for vector map data). Each pixel can have a unique set of descriptive information, such as surface elevation, unit thickness, lithology, and transmissivity.
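
The link between stacked raster surfaces and the voxel model can be sketched with two small elevation grids. Everything here is hypothetical: the `top` and `base` surfaces, the vertical origin `z0`, the layer height `dz`, and the rule that a voxel belongs to the unit when its centre lies between the two surfaces are all assumptions chosen to keep the example short.

```python
# Hypothetical stacked surfaces for one geologic unit: elevation of its upper
# surface and of its base, on the same 2 x 2 raster grid.
top = [[10.0, 12.0],
       [11.0, 13.0]]
base = [[4.0, 6.0],
        [8.0, 13.0]]


def thickness(top, base):
    """Per-cell unit thickness, derived from the two stacked surfaces."""
    return [[t - b for t, b in zip(top_row, base_row)]
            for top_row, base_row in zip(top, base)]


def voxelize(top, base, z0, dz, nz):
    """Build voxels[k][row][col], True where the centre of layer k is inside the unit."""
    voxels = []
    for k in range(nz):
        z_centre = z0 + (k + 0.5) * dz
        layer = [[b <= z_centre < t for t, b in zip(top_row, base_row)]
                 for top_row, base_row in zip(top, base)]
        voxels.append(layer)
    return voxels


unit_thickness = thickness(top, base)
voxels = voxelize(top, base, z0=0.0, dz=5.0, nz=3)
```

The stacked surfaces store two values per cell regardless of depth, whereas the voxel grid stores one value per cell per layer, so the choice between them is largely a trade-off between compactness and the ability to represent arbitrary 3-D variation.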


See also

* ArcGIS
* Data structure


References


Further reading

* B.R. Johnson et al. (1998). ''Digital geologic map data model''. v. 4.3: AASG/USGS Data Model Working Group Report, http://geology.usgs.gov/dm/.
* Soller, D.R., Berg, T.M., and Wahl, Ron (2000). "Developing the National Geologic Map Database, phase 3—An online, "living" database of map information". In Soller, D.R., ed., ''Digital Mapping Techniques '00—Workshop Proceedings:'' U.S. Geological Survey Open-File Report 00-325, p. 49–52, http://pubs.usgs.gov/openfile/of00-325/soller4.html.
* Soller, D.R., and Lindquist, Taryn (2000). "Development and public review of the draft "Digital cartographic standard for geologic map symbolization"". In Soller, D.R., ed., ''Digital Mapping Techniques '00—Workshop Proceedings:'' U.S. Geological Survey Open-File Report 00-325, p. 43–47, http://pubs.usgs.gov/openfile/of00-325/soller3.html.